
[ENH] basic setar-tree module and tests #2890

Draft pull request: 9 commits to be merged into base branch main.

TinaJin0228
Contributor

Reference Issues/PRs

#2816.

What does this implement/fix? Explain your changes.

This commit implements the major functions of the SETAR-tree algorithm and adds a basic test file for them.

Does your contribution introduce a new dependency? If yes, which one?

No.

Any other comments?

This is the initial commit for #2816.

Next steps:

  • Elaborate on the SETAR-tree algorithm, including an accelerated grid-search for finding the optimal split.
  • Add tests for corner cases and real-world datasets.
  • Write documentation and update the API reference page.
  • Explore implementing separate TAR and SETAR forecasters.

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you after the PR has been merged.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator/function to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access
  • (OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

@aeon-actions-bot added the enhancement and forecasting labels on Jun 9, 2025
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ forecasting ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

@MatthewMiddlehurst marked this pull request as draft on June 9, 2025, 10:05
Member

@MatthewMiddlehurst left a comment

Some comments, more to come.

Comment on lines +26 to +30
Parameters
----------
lag : int, default=10
    The maximum number of past lags to consider for both the AR models
    and as the thresholding variable.
Member

You should also document the horizon here; you can reuse the text from other classes.

lag : int, default=10
    The maximum number of past lags to consider for both the AR models
    and as the thresholding variable.
"""
Member

I would add an example usage of the class to the docstring here.

Comment on lines +53 to +55

for _lag in range(self.lag, 0, -1):
    if len(y) <= _lag:
Member

Just lag should be fine for this loop; no _ is needed as it's not an attribute.

Comment on lines +8 to +10

class SetarForecaster(BaseForecaster):
"""
Member

Could you please add a test file for this class with some test functions? If you could generate some expected results from another implementation to ensure correctness, that would be good as well.

Comment on lines +8 to +10

class SetarForecaster(BaseForecaster):
"""
Member

IMO the class name should just be SETAR. I do not think this is used anywhere other than forecasting, and we can easily change if it is.

Member

Add the regular SETAR to the init as well.

Member

@MatthewMiddlehurst left a comment

I think this is a good start. Could you post here what your current plans are for testing correctness, i.e. the data used and the results/implementation to compare against?

You mentioned previously having some issues with implementing global methods using the current framework. Could you also post that here?

Please add the classes to the API documentation in docs/

Comment on lines +12 to +14

class SetartreeForecaster(BaseForecaster):
"""
Member

I think SETARTree is better as a name, for the same reason as SETAR.

Contributor Author

I will also rename SetarforestForecaster to SETARForest.

Member

@MatthewMiddlehurst left a comment

I am wondering where we can improve efficiency using numba. Do you know where most time is spent processing in the current implementation?

@TinaJin0228
Contributor Author

Current plan for testing:
Several experiment results are in the blog (https://medium.com/@jintina48/gsoc-experiment-record-f0aa3bd82c18).
Fitting and forecasting both run, but there is still a significant gap in the results, so my current plan is:

  1. Carefully compare my implementation to the R codebase, especially the splitting function and the error calculator (which, in my opinion, are the most likely places to have flaws), though I need more time to locate the discrepancies.
  2. Keep testing on the Chaotic dataset first, since it is the simplest and most uniform dataset in the paper's evaluation benchmark.

Issues implementing global methods using the current framework:
The current aeon framework treats multiple time series passed as input as a single multivariate time series.
My temporary workaround is to pass one time series as the input "x" and the others as exogenous variables, which is only a quick fix.
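
For context, a global model of this kind is usually trained on lagged windows pooled across all series, rather than on one multivariate series. Below is a minimal sketch of that pooling step, assuming plain NumPy arrays of possibly different lengths. The helper names `embed_series` and `pool_series` are hypothetical and are not part of aeon or of this PR.

```python
import numpy as np


def embed_series(y, lag):
    """Turn one series into rows [y[t-1], ..., y[t-lag]] with target y[t]."""
    y = np.asarray(y, dtype=float)
    if len(y) <= lag:
        # Series too short to produce a single window.
        return np.empty((0, lag)), np.empty(0)
    windows = np.lib.stride_tricks.sliding_window_view(y, lag + 1)
    X = windows[:, :-1][:, ::-1]  # most recent lag in the first column
    target = windows[:, -1]
    return X, target


def pool_series(series_list, lag):
    """Stack the lagged windows of several independent series (hypothetical helper)."""
    parts = [embed_series(s, lag) for s in series_list]
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X, y
```

Each row of the pooled matrix comes from exactly one series, so windows never straddle two series, which is the property the exogenous-variable workaround does not give.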

As for the efficiency:
I think the primary computational bottleneck is the find_optimal_split function, which is the major step in building the tree.
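
To illustrate why this step dominates, here is a rough sketch of an exhaustive split search, not the code in this PR: for every lag and every candidate threshold it fits a linear model on each side and keeps the split with the lowest total squared error. The names `sse_of_linear_fit` and `find_optimal_split_sketch`, the evenly spaced threshold grid, and the `min_leaf` rule are all assumptions made for the example.

```python
import numpy as np


def sse_of_linear_fit(X, y):
    """Sum of squared errors of a least-squares linear fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def find_optimal_split_sketch(X, y, n_thresholds=15, min_leaf=None):
    """Return (sse, lag_index, threshold) of the best split, or None."""
    n, n_lags = X.shape
    if min_leaf is None:
        min_leaf = n_lags + 2  # assumed minimum rows per child
    best = None
    for lag in range(n_lags):
        col = X[:, lag]
        # Interior points of an evenly spaced grid over this lag's range.
        grid = np.linspace(col.min(), col.max(), n_thresholds + 2)[1:-1]
        for thr in grid:
            left = col < thr
            n_left = int(left.sum())
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            # Two least-squares fits per candidate: this is the costly part.
            sse = sse_of_linear_fit(X[left], y[left]) + sse_of_linear_fit(
                X[~left], y[~left]
            )
            if best is None or sse < best[0]:
                best = (sse, lag, float(thr))
    return best
```

The cost is roughly (number of lags) x (number of thresholds) x (two least-squares fits) per node, so reducing the number of fits or compiling the inner loop are the natural places to look for speed-ups.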

def __init__(
    self,
    lag: int = 10,
    horizon: int = 1,
Contributor

For now, set the horizon tag to False and don't pass the horizon.


return self

def _predict(self, y=None, exog=None):
Contributor

You don't need to check whether the forecaster is fitted here; that is done in the base class.


predictions = []

for _ in range(self.horizon):
Contributor

You don't need a horizon here. predict should simply predict one step ahead based on y. Predictions for longer horizons are made with iterative_forecast().
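
The general pattern is sketched below, as an illustration only: a one-step predictor is applied repeatedly and each prediction is fed back as the newest observation. This does not reproduce aeon's iterative_forecast() or its signature; `one_step_ar_predict` and `iterative_forecast_sketch` are hypothetical names, and the plain AR predictor stands in for whatever one-step model predict uses.

```python
import numpy as np


def one_step_ar_predict(y, coefs, intercept=0.0):
    """Predict the next value from the last len(coefs) observations."""
    p = len(coefs)
    window = np.asarray(y[-p:], dtype=float)[::-1]  # most recent value first
    return float(intercept + np.dot(coefs, window))


def iterative_forecast_sketch(y, one_step, horizon):
    """Roll a one-step predictor forward `horizon` times (illustrative only)."""
    history = list(y)
    preds = []
    for _ in range(horizon):
        nxt = one_step(np.asarray(history))
        preds.append(nxt)
        history.append(nxt)  # feed the prediction back in as if observed
    return np.array(preds)
```

With this separation the one-step predictor needs no horizon parameter at all, which is the point of the comment above.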

@TinaJin0228
Contributor Author

Progress: I’ve successfully reproduced the results from the paper (with a tiny gap) on the Chaotic dataset, using an independent Python implementation. I’ll update the aeon branch once I’ve confirmed that the method works within the aeon framework. This will be done alongside the modification requests mentioned above.

Labels: enhancement, forecasting

3 participants